feat: Support inline eval definitions by Copilot · Pull Request #43 · primer/agent-eval

Copilot · 2026-06-11T16:22:42Z

Experiments can now reference evals defined outside the repository’s generated eval registry. Inline evals include a name, project-local directory, and optional config/test path overrides that resolve from the CLI working directory.

Experiment config
- Allows evals entries to be either built-in eval IDs or inline eval objects.
- Adds shared eval config types for prompt metadata.

Eval resolution
- Resolves built-in eval IDs through the existing registry.
- Resolves inline eval paths relative to process.cwd().
- Supports inline config, configPath, and testPath.
- Preserves sandbox spoofing behavior by normalizing inline evals into the same runtime shape as generated evals.

Validation + docs
- Adds focused coverage for built-in lookup, cwd-relative inline paths, and custom config/test paths.
- Documents inline eval usage.
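
The resolution rules listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `InlineEvalRef`, `builtInEvals`, and `resolveEvalDirectory` are hypothetical names, and the generated registry is stubbed as a plain map from eval ID to directory.

```typescript
import path from 'node:path'

// Hypothetical shapes for illustration; the PR's real types may differ.
type InlineEvalRef = {name?: string; path: string}
type EvalRef = string | InlineEvalRef

// Stand-in for the generated eval registry (assumed to map IDs to directories).
const builtInEvals: Record<string, string> = {
  'example-eval': '/registry/example-eval',
}

// Returns the directory an eval entry points at.
// Built-in IDs go through the registry; inline paths resolve from `cwd`.
function resolveEvalDirectory(ref: EvalRef, cwd: string = process.cwd()): string {
  if (typeof ref === 'string') {
    const directory = builtInEvals[ref]
    if (directory === undefined) {
      throw new Error(`Unknown built-in eval: ${ref}`)
    }
    return directory
  }
  return path.resolve(cwd, ref.path)
}
```

Passing `cwd` as a parameter with a `process.cwd()` default keeps the cwd-relative behavior while making the function easy to exercise in tests.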

export const experiment: ExperimentConfig = {
  name: 'Local project experiment',
  description: 'Run an eval from the current project',
  models: ['gpt-5.5'],
  evals: [
    {
      name: 'local-button-eval',
      path: './evals/button',
      config: {
        prompt: 'Update the local project to use a Primer button',
      },
      testPath: 'button.eval.test.ts',
    },
  ],
  treatments: [],
}

joshblack

@copilot evals should use the same structure as what we have. Folks shouldn't have to specify the test path or prompt inline; those should live in the eval config in the eval itself.

Copilot · 2026-06-30T15:07:41Z

Addressed in 4449adb by removing inline prompt/test path overrides and resolving inline evals from their eval.config.ts and eval.test.ts files.
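
For readers following along, the convention described here (an inline eval directory containing `eval.config.ts` and `eval.test.ts`) might look like the sketch below. `evalFilePaths` is a hypothetical helper name used for illustration, not a function confirmed to exist in the PR.

```typescript
import path from 'node:path'

// Hypothetical helper: derives the conventional file locations inside an
// eval directory, using the file names mentioned in the comment above.
function evalFilePaths(directory: string): {configPath: string; testPath: string} {
  return {
    configPath: path.join(directory, 'eval.config.ts'),
    testPath: path.join(directory, 'eval.test.ts'),
  }
}
```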

joshblack · 2026-06-30T15:08:32Z

@copilot resolve the merge conflicts in this pull request

Copilot · 2026-06-30T15:15:02Z

Resolved the merge conflicts in d76ea88 and repaired the merged pnpm lockfile in ef8c140.

joshblack · 2026-06-30T16:12:48Z

+  models: ['gpt-5.5'],
+  evals: [
+    {
+      name: 'local-button-eval',


@copilot name should be optional. By default, let's use the directory name (e.g. ./evals/local-button-eval should have the default name local-button-eval)

Addressed in 5cdc3b5 by making inline eval name optional and defaulting it from the eval directory name.
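
A minimal sketch of the defaulting behavior requested above, assuming the name is derived with Node's `path.basename`. `defaultEvalName` is a hypothetical name; the actual change in 5cdc3b5 may be implemented differently.

```typescript
import path from 'node:path'

// Hypothetical helper: uses the explicit name when given, otherwise falls
// back to the last segment of the resolved eval directory.
function defaultEvalName(evalPath: string, name?: string): string {
  return name ?? path.basename(path.resolve(evalPath))
}

// './evals/local-button-eval' yields 'local-button-eval'
const derivedName = defaultEvalName('./evals/local-button-eval')
```

Resolving the path first means inputs such as `.` fall back to the name of the current directory instead of a literal dot.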

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds support for referencing “inline” evals (local directories) in experiment configs, while standardizing type exports and import specifiers across packages.

Changes:

- Extend ExperimentConfig.evals to accept either built-in eval IDs or inline eval directory references.
- Add eval resolution logic in agent-eval to load inline eval config/tests from disk, and wire it into the CLI.
- Normalize internal imports by removing explicit .ts extensions.

Show a summary per file

File	Description
packages/sandbox/src/index.ts	Normalizes relative import specifiers (drops `.ts`).
packages/experiment/src/index.ts	Re-exports new eval-related config types and normalizes import specifier.
packages/experiment/src/config.ts	Introduces `ExperimentEvalConfig` and updates `ExperimentConfig.evals` type accordingly.
packages/evals/src/index.ts	Normalizes generated module import specifiers (drops `.ts`).
packages/agent-eval/src/treatment.ts	Switches treatments to reference `ResolvedEval` instead of `Eval`.
packages/agent-eval/src/eval.ts	Adds resolver for built-in vs inline evals (filesystem validation + config import).
packages/agent-eval/src/eval.test.ts	Adds Vitest coverage for built-in and inline eval resolution behavior.
packages/agent-eval/src/config.ts	Reuses shared `EvalConfig` type from `@primer/agent-experiment`.
packages/agent-eval/src/cli.ts	Resolves evals upfront (built-in or inline) and uses resolved eval objects when constructing treatments.
README.md	Documents inline eval usage and expected file structure.

Review details

Files reviewed: 10/10 changed files
Comments generated: 3
Review effort level: Low

+async function assertDirectory(directory: string, name: string) {
+  const stats = await fs.stat(directory).catch(() => undefined)
+  if (!stats?.isDirectory()) {
+    throw new Error(`Eval "${name}" directory was not found: ${directory}`)
+  }
+}
+
+async function assertFile(filepath: string, name: string) {
+  const stats = await fs.stat(filepath).catch(() => undefined)
+  if (!stats?.isFile()) {
+    throw new Error(`Eval "${name}" test file was not found: ${filepath}`)
+  }
+}
+
+async function loadEvalConfig(configPath: string, name: string): Promise<EvalConfig> {
+  const configModule = (await import(configPath)) as {default?: unknown}
+  if (!isEvalConfig(configModule.default)) {
+    throw new Error(`Eval "${name}" config must export a default config with a prompt`)
+  }
+  return configModule.default
+}


+function isEvalConfig(value: unknown): value is EvalConfig {
+  return (
+    value !== null &&
+    typeof value === 'object' &&
+    'prompt' in value &&
+    typeof (value as Record<string, unknown>).prompt === 'string'
+  )
+}
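
To make the guard's behavior concrete, here is a small usage sketch. It assumes `EvalConfig` has at least a string `prompt` field (the real type from `@primer/agent-experiment` may carry more), and repeats the guard from the diff so the snippet stands alone.

```typescript
// Assumed minimal shape; the real EvalConfig may have additional fields.
type EvalConfig = {prompt: string}

// Same guard as in the diff above, repeated so the snippet is self-contained.
function isEvalConfig(value: unknown): value is EvalConfig {
  return (
    value !== null &&
    typeof value === 'object' &&
    'prompt' in value &&
    typeof (value as Record<string, unknown>).prompt === 'string'
  )
}

const candidates: unknown[] = [
  {prompt: 'Update the local project to use a Primer button'}, // accepted
  {prompt: 42}, // rejected: prompt is not a string
  {}, // rejected: no prompt property
  null, // rejected: null is excluded explicitly
  'prompt', // rejected: not an object
]

// One boolean per candidate, in order.
const results = candidates.map(candidate => isEvalConfig(candidate))
```

The explicit `value !== null` check matters because `typeof null` is `'object'` in JavaScript, so without it the `in` operator would throw on a `null` default export.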


Copilot AI added 2 commits June 11, 2026 16:17

feat: Support inline eval configs

d21f903

fix: Repair CI validation issues

1143b51

Copilot AI assigned Copilot and joshblack Jun 11, 2026

Copilot created this pull request from a session on behalf of joshblack June 11, 2026 16:23

Copilot finished work on behalf of joshblack June 11, 2026 16:23

Copilot AI requested a review from joshblack June 11, 2026 16:23

joshblack reviewed Jun 30, 2026

Copilot started work on behalf of joshblack June 30, 2026 15:03

fix: Align inline evals with eval directory structure

4449adb

Copilot finished work on behalf of joshblack June 30, 2026 15:08

Copilot AI requested a review from joshblack June 30, 2026 15:08

Copilot started work on behalf of joshblack June 30, 2026 15:09

Copilot AI added 2 commits June 30, 2026 15:11

Merge remote-tracking branch 'origin/main' into copilot/define-evals-inline-in-experiment

d76ea88

Conflicts: packages/experiment/src/config.ts

fix: Repair merged pnpm lockfile

ef8c140

Copilot finished work on behalf of joshblack June 30, 2026 15:15

joshblack reviewed Jun 30, 2026

Copilot started work on behalf of joshblack June 30, 2026 16:13

fix: Default inline eval name to directory

5cdc3b5

Copilot AI requested a review from joshblack June 30, 2026 16:16

Copilot finished work on behalf of joshblack June 30, 2026 16:16

joshblack marked this pull request as ready for review June 30, 2026 16:23

Copilot AI review requested due to automatic review settings June 30, 2026 16:23

joshblack merged commit 5dc2ed6 into main Jun 30, 2026
7 checks passed

joshblack deleted the copilot/define-evals-inline-in-experiment branch June 30, 2026 16:23

Copilot stopped reviewing on behalf of joshblack due to an error June 30, 2026 16:24
Failed to launch agent

Copilot AI reviewed Jun 30, 2026

feat: Support inline eval definitions #43
joshblack merged 6 commits into main from copilot/define-evals-inline-in-experiment

3 participants
